THERMAL TOLERANT CELLULASE FROM ACIDOTHERMUS CELLULOLYTICUS

Government Interests 

The United States Government has rights in this invention under Contract
No. DE-AC36-99GO10337 between the United States Department of Energy and the
National Renewable Energy Laboratory, a Division of the Midwest Research
Institute.

Field of the Invention 

The invention generally relates to a novel cellulase from Acidothermus
cellulolyticus, GuxA. More specifically, the invention relates to purified and
isolated GuxA polypeptides, nucleic acid molecules encoding the polypeptides, and 
processes for production and use of GuxA, as well as variants and derivatives 
thereof. 

Background of the Invention

Plant biomass as a source of energy production can include agricultural and 
forestry products, associated by-products and waste, municipal solid waste, and 
industrial waste. In addition, over 50 million acres in the United States are currently 
available for biomass production, and there are a number of terrestrial and aquatic
crops grown solely as a source for biomass (A Wiselogel, et al. Biomass feedstocks
resources and composition. In CE Wyman, ed. Handbook on Bioethanol: Production
and Utilization. Washington, DC: Taylor & Francis, 1996, pp 105-118). Biofuels
produced from biomass include ethanol, methanol, biodiesel, and additives for
reformulated gasoline. Biofuels are desirable because they add little, if any, net
carbon dioxide to the atmosphere and because they greatly reduce ozone formation
and carbon monoxide emissions as compared to the environmental output of
conventional fuels (P Bergeron. Environmental impacts of bioethanol. In CE
Wyman, ed. Handbook on Bioethanol: Production and Utilization. Washington, DC:
Taylor & Francis, 1996, pp 90-103).

Plant biomass is the most abundant source of carbohydrate in the world due to
the lignocellulosic materials composing the cell walls of all higher plants. Plant cell
walls are divided into two sections, the primary and the secondary cell walls. The
primary cell wall, which provides structure for expanding cells (and hence changes 
as the cell grows), is composed of three major polysaccharides and one group of 
glycoproteins. The predominant polysaccharide, and most abundant source of
carbohydrates, is cellulose, while hemicellulose and pectin are also found in
abundance. Cellulose is a linear beta-(1,4)-D-glucan and comprises 20% to 30% of
the primary cell wall by weight. The secondary cell wall, which is produced after the 
cell has completed growing, also contains polysaccharides and is strengthened 
through polymeric lignin covalently cross-linked to hemicellulose. 

Carbohydrates, and cellulose in particular, can be converted to sugars by
well-known methods including acid and enzymatic hydrolysis. Enzymatic hydrolysis of
cellulose requires the processing of biomass to reduce size and facilitate subsequent
handling. Mild acid treatment is then used to hydrolyze part or all of the hemicellulose
content of the feedstock. Finally, cellulose is converted to ethanol through the
concerted action of cellulases and saccharolytic fermentation (simultaneous
saccharification and fermentation (SSF)). The SSF process, using the yeast
Saccharomyces cerevisiae for example, is often incomplete, as it does not utilize the 
entire sugar content of the plant biomass, namely the hemicellulose fraction. 

The cost of producing ethanol from biomass can be divided into three areas
of expenditure: pretreatment costs, fermentation costs, and other costs. Pretreatment
costs include biomass milling, pretreatment reagents, equipment maintenance, power
and water, and waste neutralization and disposal. The fermentation costs can
include enzymes, nutrient supplements, yeast, maintenance and scale-up, and waste
disposal. Other costs include biomass purchase, transportation and storage, plant
labor, plant utilities, ethanol distillation, and administration (which may include
technology-use licenses). One of the major expenses incurred in SSF is the cost of
the enzymes, as about one kilogram of cellulase is required to fully digest 50
kilograms of cellulose. Economical production of cellulase is also complicated by
factors such as the relatively slow growth rates of cellulase-producing organisms,
levels of cellulase expression, and the tendency of enzyme-dependent processes to
partially or completely inactivate enzymes due to conditions such as elevated
temperature, acidity, proteolytic degradation, and solvent degradation.
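
The enzyme-loading figure quoted above lends itself to a quick estimate. The sketch below assumes that the ratio of about one kilogram of cellulase per 50 kilograms of cellulose scales linearly with batch size, which the text does not state. The function name `cellulase_required_kg` and its parameters are hypothetical and are introduced only for illustration.

```python
def cellulase_required_kg(cellulose_kg: float,
                          cellulose_per_kg_enzyme: float = 50.0) -> float:
    """Estimate the cellulase mass (kg) needed to fully digest cellulose.

    Assumes linear scaling of the approximate 1:50 enzyme-to-cellulose
    ratio given in the text; real loadings depend on process conditions.
    """
    if cellulose_kg < 0:
        raise ValueError("cellulose_kg must be non-negative")
    if cellulose_per_kg_enzyme <= 0:
        raise ValueError("cellulose_per_kg_enzyme must be positive")
    return cellulose_kg / cellulose_per_kg_enzyme


# Example: a hypothetical batch containing 1000 kg of cellulose.
print(cellulase_required_kg(1000.0))
```

Under this assumption the enzyme requirement grows in direct proportion to the cellulose processed, which is why enzyme cost figures prominently among the SSF expenses.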

Enzymatic degradation of cellulose requires the coordinate action of at least
three different types of cellulases. Such enzymes are given an Enzyme Commission
(EC) designation according to the Nomenclature Committee of the International
Union of Biochemistry and Molecular Biology (Eur. J. Biochem. 264: 607-609 and
610-650, 1999). Endo-beta-(1,4)-glucanases (EC 3.2.1.4) cleave the cellulose
strand randomly along its length, thus generating new chain ends.
Exo-beta-(1,4)-glucanases (EC 3.2.1.91) are processive enzymes and cleave cellobiosyl
units (beta-(1,4)-glucose dimers) from free ends of cellulose strands. Lastly,
beta-D-glucosidases (cellobiases: EC 3.2.1.21) hydrolyze cellobiose to glucose. All three of
these general activities are required for efficient and complete hydrolysis of a
polymer such as cellulose to a subunit, such as the simple sugar, glucose.

Highly thermostable enzymes have been isolated from the cellulolytic
thermophile Acidothermus cellulolyticus gen. nov., sp. nov., a bacterium originally
isolated from decaying wood in an acidic, thermal pool at Yellowstone National Park.
A. Mohagheghi et al. (1986), Int. J. Systematic Bacteriology, 36(3): 435-443. One
cellulase enzyme produced by this organism, the endoglucanase E1, is known to
display maximal activity at 75°C to 83°C. M.P. Tucker et al. (1989),
Bio/Technology, 7(8): 817-820. E1 endoglucanase has been described in U.S. Patent
5,275,944. The A. cellulolyticus E1 endoglucanase is an active cellulase; in
combination with the exocellulase CBH I from Trichoderma reesei, E1 gives a high
level of saccharification and contributes to a degree of synergism. Baker JO et al.
(1994), Appl. Biochem. Biotechnol., 45/46: 245-256. The gene encoding the E1 catalytic
and cellulose binding domains and linker peptide was described in U.S. Patent
5,536,655. E1 has also been expressed as a stable, active enzyme from a wide
variety of hosts, including E. coli, Streptomyces lividans, Pichia pastoris, cotton,
tobacco, and Arabidopsis (Dai Z, Hooker BS, Anderson DB, Thomas SR.
Transgenic Res. 2000 Feb; 9(1):43-54).

There is a need within the art to generate alternative cellulase enzymes
capable of assisting in the commercial-scale processing of cellulose to sugar for use
in biofuel production. Against this backdrop the present invention has been
developed.

The potential exists for the successful, commercial-scale expression of
heterologous cellulase polypeptides, and in particular novel cellulase polypeptides
with or without any one or more desirable properties such as thermal tolerance, and
partial or complete resistance to extreme pH inactivation, proteolytic inactivation,
solvent inactivation, chaotropic agent inactivation, oxidizing agent inactivation, and
detergent inactivation. Such expression can occur in fungi, bacteria, and other hosts.

Summary of the Invention

The present invention provides GuxA, a novel member of the glycoside
hydrolase (GH) family of enzymes, and in particular a thermal tolerant glycoside
hydrolase useful in the degradation of cellulose. GuxA polypeptides of the invention
include those having an amino acid sequence shown in SEQ ID NO:1, as well as
polypeptides having substantial amino acid sequence identity to the amino acid
sequence of SEQ ID NO:1 and useful fragments thereof, including a first catalytic
domain having significant sequence similarity to the GH6 family, a second catalytic
domain having significant sequence similarity to the GH12 family, a first cellulose
binding domain (type II) and a second cellulose binding domain (type III).

The invention also provides a polynucleotide molecule encoding GuxA
polypeptides and fragments of GuxA polypeptides, for example catalytic and
cellulose binding domains. Polynucleotide molecules of the invention include those
molecules having a nucleic acid sequence as shown in SEQ ID NO:2; those that
hybridize to the nucleic acid sequence of SEQ ID NO:2 under high stringency
conditions; and those having substantial nucleic acid identity with the nucleic acid
sequence of SEQ ID NO:2.

The invention includes variants and derivatives of the GuxA polypeptides,
including fusion proteins. For example, fusion proteins of the invention include
GuxA polypeptide fused to a heterologous protein or peptide that confers a desired
function. The heterologous protein or peptide can facilitate purification,
oligomerization, stabilization, or secretion of the GuxA polypeptide, for example.
As further examples, the heterologous polypeptide can provide enhanced activity,
including catalytic or binding activity, for GuxA polypeptides, where the
enhancement is either additive or synergistic. A fusion protein of an embodiment of
the invention can be produced, for example, from an expression construct containing
a polynucleotide molecule encoding GuxA polypeptide in frame with a
polynucleotide molecule for the heterologous protein. Embodiments of the
invention also comprise vectors, plasmids, expression systems, host cells, and the
like, containing a GuxA polynucleotide molecule. Genetic engineering methods for
the production of GuxA polypeptides of embodiments of the invention include
expression of a polynucleotide molecule in cell free expression systems and in
cellular hosts, according to known methods.

The invention further includes compositions containing a substantially
purified GuxA polypeptide of the invention and a carrier. Such compositions are
administered to a biomass containing cellulose for the reduction or degradation of
the cellulose.

The invention also provides reagents, compositions, and methods that are 
useful for analysis of GuxA activity. 

These and various other features as well as advantages which characterize the
present invention will be apparent from a reading of the following detailed
description and a review of the associated drawings.

The following Tables 5 and 6 include sequences used in describing
embodiments of the present invention. In Table 5, the abbreviations are as follows:
CD, catalytic domain; CBD_II, carbohydrate binding domain type II; CBD_III,
carbohydrate binding domain type III; and FN-III, fibronectin domain type III. When
used herein, N* indicates a string of unknown nucleic acid units, and X* indicates a
string of unknown amino acid units, for example about 50 or more. Table 5 includes
approximate start and stop information for segments, and Table 6 includes amino
acid sequence data for segments.

Brief Description of the Drawings

FIG. 1 is a schematic representation of the gene sequence and amino acid
segment organization.

FIG. 2 is a graphic representation of the glycoside hydrolase gene/protein
families found in various organisms.

Detailed Description 

Definitions: 

The following definitions are provided to facilitate understanding of certain
terms used frequently herein and are not meant to limit the scope of the present
disclosure:

"Amino acid" refers to any of the twenty naturally occurring amino acids as
well as any modified amino acid sequences. Modifications may include natural
processes such as posttranslational processing, or may include chemical
modifications which are known in the art. Modifications include but are not limited
to: phosphorylation, ubiquitination, acetylation, amidation, glycosylation, covalent
attachment of flavin, ADP-ribosylation, cross linking, iodination, methylation, and
the like.

"Antibody" refers to a Y-shaped molecule having a pair of antigen binding
sites, a hinge region and a constant region. Fragments of antibodies, for example an
antigen binding fragment (Fab), chimeric antibodies, antibodies having a human
constant region coupled to a murine antigen binding region, and fragments thereof,
as well as other well known recombinant antibodies are included in the present
invention.

"Antisense" refers to polynucleotide sequences that are complementary to
a target "sense" polynucleotide sequence.

"Binding activity" refers to any activity that can be assayed by characterizing 
the ability of a polypeptide to bind to a substrate. The substrate can be a polymer 
such as cellulose or can be a complex molecule or aggregate of molecules where the
entire moiety comprises at least some cellulose. Note that when used herein the
terms cellulose binding domain (CBD) and carbohydrate binding domain are used
interchangeably. 

"Cellulase activity" refers to any activity that can be assayed by 
characterizing the enzymatic activity of a cellulase. For example, cellulase activity
can be assayed by determining how much reducing sugar is produced during a fixed
amount of time for a set amount of enzyme (see Irwin et al., (1998) J. Bacteriology, 
1709-1714). Other assays are well known in the art and can be substituted. 

"Complementary" or "complementarity" refers to the ability of a 
nucleotide in a polynucleotide molecule to form a base pair with another
nucleotide in a second polynucleotide molecule. For example, the sequence
A-G-T is complementary to the sequence T-C-A. Complementarity may be partial, in
which only some of the nucleotides match according to base pairing, or
complete, where all the nucleotides match according to base pairing.
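
The A-G-T / T-C-A example above can be reproduced with a short sketch. It assumes DNA bases only (A, T, G, C), equal-length sequences compared position by position as in the example, and no consideration of strand orientation. The names `complement` and `fraction_complementary` are hypothetical helpers introduced for illustration.

```python
# Watson-Crick base pairs for DNA (assumption: DNA only, no RNA or
# modified bases).
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(seq: str) -> str:
    """Return the position-wise complement of a DNA sequence."""
    return "".join(PAIRS[base] for base in seq.upper())


def fraction_complementary(first: str, second: str) -> float:
    """Fraction of aligned positions at which the two bases pair.

    1.0 corresponds to complete complementarity; values between 0 and 1
    correspond to partial complementarity.
    """
    if len(first) != len(second) or not first:
        raise ValueError("sequences must be non-empty and of equal length")
    paired = sum(
        1 for x, y in zip(first.upper(), second.upper()) if PAIRS.get(x) == y
    )
    return paired / len(first)


print(complement("AGT"))                      # TCA
print(fraction_complementary("AGT", "TCA"))   # 1.0
```

A mismatch at one of the three positions, for example comparing "AGT" with "TCC", gives a partial score of two thirds under this position-wise scheme.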

"Expression" refers to transcription and translation occurring within a host 

cell. The level of expression of a DNA molecule in a host cell may be determined
on the basis of either the amount of corresponding mRNA that is present within the 
cell or the amount of DNA molecule encoded protein produced by the host cell 
(Sambrook et al., 1989, Molecular cloning: A Laboratory Manual, 18.1-18.88). 

"Fusion protein" refers to a first protein having attached a second, 

heterologous protein. Preferably, the heterologous protein is fused via recombinant
DNA techniques, such that the first and second proteins are expressed in frame. The
heterologous protein can confer a desired characteristic to the fusion protein, for
example, a detection signal, enhanced stability or stabilization of the protein,
facilitated oligomerization of the protein, or facilitated purification of the fusion
protein. Examples of heterologous proteins useful in the fusion proteins of the
invention include molecules having one or more catalytic domains of GuxA, one or
more binding domains of GuxA, one or more catalytic domains of a glycoside
hydrolase other than GuxA, one or more binding domains of a glycoside hydrolase
other than GuxA, or any combination thereof. Further examples include
immunoglobulin molecules and portions thereof, peptide tags such as histidine tag
(6-His), leucine zipper, substrate targeting moieties, signal peptides, and the like.
Fusion proteins are also meant to encompass variants and derivatives of GuxA
polypeptides that are generated by conventional site-directed mutagenesis and more
modern techniques such as directed evolution, discussed infra.

"Genetically engineered" refers to any recombinant DNA or RNA method 
used to create a prokaryotic or eukaryotic host cell that expresses a protein at
elevated levels, at lowered levels, or in a mutated form. In other words, the host cell
has been transfected, transformed, or transduced with a recombinant polynucleotide
molecule, and thereby been altered so as to cause the cell to alter expression of the
desired protein. Methods and vectors for genetically engineering host cells are well
known in the art; for example various techniques are illustrated in Current Protocols
in Molecular Biology, Ausubel et al., eds. (Wiley & Sons, New York, 1988, and
quarterly updates). Genetic engineering techniques include but are not limited to
expression vectors, targeted homologous recombination and gene activation (see, for
example, U.S. Patent No. 5,272,071 to Chappel) and trans activation by engineered
transcription factors (see, for example, Segal et al., 1999, Proc Natl Acad Sci USA
96(6):2758-63).

"Glycoside hydrolase family" refers to a family of enzymes which hydrolyze 
the glycosidic bond between two or more carbohydrates or between a carbohydrate 
and a non-carbohydrate moiety (Henrissat B., (1991) Biochem. J., 280:309-316).
Identification of a putative glycoside hydrolase family member is made based on an
amino acid sequence comparison and the finding of significant sequence similarity 
within the putative member's catalytic domain, as compared to the catalytic domains 
of known family members. 

"Homology" refers to a degree of complementarity between polynucleotides, 

having significant effect on the efficiency and strength of hybridization between
polynucleotide molecules. The term also can refer to a degree of similarity between 
polypeptides. 

"Host cell" or "host cells" refers to cells expressing a heterologous 
polynucleotide molecule. Host cells of the present invention express
polynucleotides encoding GuxA or a fragment thereof. Examples of suitable host
cells useful in the present invention include, but are not limited to, prokaryotic and
eukaryotic cells. Specific examples of such cells include bacteria of the genera
Escherichia, Bacillus, and Salmonella, as well as members of the genera
Pseudomonas, Streptomyces, and Staphylococcus; fungi, particularly filamentous fungi
such as Trichoderma and Aspergillus, Phanerochaete chrysosporium and other white
rot fungi; also other fungi including Fusaria, molds, and yeast including
Saccharomyces sp., Pichia sp., and Candida sp. and the like; plants e.g. Arabidopsis,
cotton, barley, tobacco, potato, and aquatic plants and the like; SF9 insect cells
(Summers and Smith, 1987, Texas Agriculture Experiment Station Bulletin, 1555),
and the like. Other specific examples include mammalian cells such as human
embryonic kidney cells (293 cells), Chinese hamster ovary (CHO) cells (Puck et al.,
1958, Proc. Natl. Acad. Sci. USA 60, 1275-1281), human cervical carcinoma cells
(HELA) (ATCC CCL 2), human liver cells (Hep G2) (ATCC HB8065), human
breast cancer cells (MCF-7) (ATCC HTB22), human colon carcinoma cells (DLD-1)
(ATCC CCL 221), Daudi cells (ATCC CRL-213), murine myeloma cells such as
P3/NSI/1-Ag4-1 (ATCC TIB-18), P3X63Ag8 (ATCC TIB-9), SP2/0-Ag14 (ATCC
CRL-1581) and the like.

"Hybridization" refers to the pairing of complementary polynucleotides 
during an annealing period. The strength of hybridization between two 
polynucleotide molecules is impacted by the homology between the two molecules,
stringency of the conditions involved, the melting temperature of the formed hybrid
and the G:C ratio within the polynucleotides.
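
One of the factors named above can be illustrated numerically. The sketch below assumes that "G:C ratio" means the fraction of G plus C bases among all bases in a sequence, an interpretation the text does not spell out. The helper name `gc_fraction` is hypothetical.

```python
def gc_fraction(seq: str) -> float:
    """Return the fraction of bases in seq that are G or C.

    Assumption: the G:C ratio is taken as (G + C) / total bases.
    """
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    gc_count = sum(1 for base in seq if base in "GC")
    return gc_count / len(seq)


print(gc_fraction("ATGC"))  # 0.5
```

G:C pairs form three hydrogen bonds and A:T pairs form two, so a hybrid with a higher G:C fraction generally has a higher melting temperature.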

"Identity" refers to a comparison between pairs of nucleic acid or amino acid 
molecules. Methods for determining sequence identity are known. See, for 
example, computer programs commonly employed for this purpose, such as the Gap
program (Wisconsin Sequence Analysis Package, Version 8 for Unix, Genetics
Computer Group, University Research Park, Madison, Wisconsin), that uses the
algorithm of Smith and Waterman, 1981, Adv. Appl. Math., 2: 482-489.

"Isolated" refers to a polynucleotide or polypeptide that has been separated 
from at least one contaminant (polynucleotide or polypeptide) with which it is
normally associated. For example, an isolated polynucleotide or polypeptide is in a
context or in a form that is different from that in which it is found in nature.

"Nucleic acid sequence" refers to the order or sequence of
deoxyribonucleotides along a strand of deoxyribonucleic acid. The order of these
deoxyribonucleotides determines the order of amino acids along a polypeptide chain.
The deoxyribonucleotide sequence thus codes for the amino acid sequence.

"Polynucleotide" refers to a linear sequence of nucleotides. The nucleotides
may be ribonucleotides, or deoxyribonucleotides, or a mixture of both. Examples of
polynucleotides in the context of the present invention include single and double
stranded DNA, single and double stranded RNA, and hybrid molecules having
mixtures of single and double stranded DNA and RNA. The polynucleotides of the
present invention may contain one or more modified nucleotides.

"Protein," "peptide," and "polypeptide" are used interchangeably to denote an 
amino acid polymer or a set of two or more interacting or bound amino acid 
polymers. 

"Purify," or "purified" refers to a target protein that is free from at least 5- 

10% of contaminating proteins. Purification of a protein from contaminating
proteins can be accomplished using known techniques, including ammonium sulfate
or ethanol precipitation, acid precipitation, heat precipitation, anion or cation exchange
chromatography, phosphocellulose chromatography, hydrophobic interaction
chromatography, affinity chromatography, hydroxylapatite chromatography,
size-exclusion chromatography, and lectin chromatography. Various protein purification
techniques are illustrated in Current Protocols in Molecular Biology, Ausubel et al.,
eds. (Wiley & Sons, New York, 1988, and quarterly updates).

"Selectable marker" refers to a marker that identifies a cell as having 
undergone a recombinant DNA or RNA event. Selectable markers include, for
example, genes that encode antimetabolite resistance such as the DHFR protein that
confers resistance to methotrexate (Wigler et al., 1980, Proc Natl Acad Sci USA
77:3567; O'Hare et al., 1981, Proc Natl Acad Sci USA, 78:1527), the GPT protein
that confers resistance to mycophenolic acid (Mulligan & Berg, 1981, PNAS USA,
78:2072), the neomycin resistance marker that confers resistance to the
aminoglycoside G-418 (Colbere-Garapin et al., 1981, J Mol Biol, 150:1), the Hygro
protein that confers resistance to hygromycin (Santerre et al., 1984, Gene 30:147),
and the Zeocin™ resistance marker (Invitrogen). In addition, the herpes simplex
virus thymidine kinase, hypoxanthine-guanine phosphoribosyltransferase and
adenine phosphoribosyltransferase genes can be employed in tk-, hgprt- and aprt-
cells, respectively.

"Stringency" refers to the conditions (temperature, ionic strength, solvents,
etc.) under which hybridization between polynucleotides occurs. A hybridization
reaction conducted under high stringency conditions is one that will only occur
between polynucleotide molecules that have a high degree of complementary base
pairing (85% to 100% identity). Conditions for high stringency hybridization, for
example, may include an overnight incubation at about 42°C for about 2.5 hours in 6 X
SSC/0.1% SDS, followed by washing of the filters in 1.0 X SSC at 65°C, 0.1% SDS.
A hybridization reaction conducted under moderate stringency conditions is one that
will occur between polynucleotide molecules that have an intermediate degree of
complementary base pairing (50% to 84% identity).
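
The identity thresholds given above can be expressed as a small classifier. This is a minimal sketch that assumes the two sequences are already aligned and of equal length, and that values falling between 84% and 85% are grouped with the moderate class. The names `percent_identity` and `stringency_class`, and the label "below moderate", are hypothetical and do not come from the text.

```python
def percent_identity(first: str, second: str) -> float:
    """Percent of aligned positions that carry the same residue."""
    if len(first) != len(second) or not first:
        raise ValueError("sequences must be non-empty and of equal length")
    same = sum(1 for x, y in zip(first.upper(), second.upper()) if x == y)
    return 100.0 * same / len(first)


def stringency_class(identity_percent: float) -> str:
    """Map percent identity to the stringency level it could satisfy.

    Thresholds follow the definition above: 85% to 100% for high
    stringency and 50% to 84% for moderate stringency.
    """
    if not 0.0 <= identity_percent <= 100.0:
        raise ValueError("identity_percent must be between 0 and 100")
    if identity_percent >= 85.0:
        return "high"
    if identity_percent >= 50.0:
        return "moderate"
    return "below moderate"


identity = percent_identity("ACGTACGTAC", "ACGTACGTAA")
print(identity, stringency_class(identity))  # 90.0 high
```

In practice, identity is usually computed after an alignment step such as the Gap program mentioned in the definition of "Identity"; the equal-length comparison here stands in for that step.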

"Substrate targeting moiety" refers to any signal on a substrate, either
naturally occurring or genetically engineered, used to target any GuxA polypeptide
or fragment thereof to a substrate. Such targeting moieties include ligands that bind
to a substrate structure. Examples of ligand/receptor pairs include cellulose binding
domains and cellulose. Many such substrate-specific ligands are known and are
useful in the present invention to target a GuxA polypeptide or fragment thereof to a
substrate. A novel example is a GuxA cellulose binding domain that is used to
tether other molecules to a cellulose-containing substrate such as a fabric.

"Thermal tolerant" refers to the property of withstanding partial or complete 
inactivation by heat and can also be described as thermal resistance or thermal
stability. Although some variation exists in the literature, the following definitions
can be considered typical for the optimum temperature range of stability and activity
for enzymes: psychrophilic (below freezing to 10°C); mesophilic (10°C to 50°C);
thermophilic (50°C to 75°C); and caldophilic (75°C to above boiling water
temperature). The stability and catalytic activity of enzymes are linked
characteristics, and the ways of measuring these properties vary considerably. For
industrial enzymes, stability and activity are best measured under use conditions,
often in the presence of substrate. Therefore, cellulases that must act on process
streams of cellulose must be able to withstand exposure up to thermophilic or even
caldophilic temperatures for digestion times in excess of several hours.
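
The temperature classes listed above can be written as a simple lookup. The sketch assumes that each shared boundary (10°C, 50°C, 75°C) belongs to the warmer class, a choice the text leaves open. The function name `temperature_class` is hypothetical.

```python
def temperature_class(optimum_celsius: float) -> str:
    """Classify an enzyme by its optimum temperature in degrees Celsius.

    Ranges follow the typical definitions quoted above. Assumption:
    a boundary value is assigned to the warmer class.
    """
    if optimum_celsius < 10.0:
        return "psychrophilic"
    if optimum_celsius < 50.0:
        return "mesophilic"
    if optimum_celsius < 75.0:
        return "thermophilic"
    return "caldophilic"


# An optimum of 80 degrees C lies within the 75 to 83 degrees C range
# reported for the E1 endoglucanase.
print(temperature_class(80.0))  # caldophilic
```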

In encompassing a wide variety of potential applications for embodiments of
the present invention, thermal tolerance refers to the ability to function in a
temperature range of from about 15°C to about 100°C. A preferred range is from
about 30°C to about 80°C. A highly preferred range is from about 50°C to about
70°C. For example, a protein that can function at about 45°C is considered in the
preferred range even though it may be susceptible to partial or complete inactivation
at temperatures in a range above about 45°C and less than about 80°C. For
polypeptides derived from organisms such as Acidothermus, the desirable property
of thermal tolerance is often accompanied by other desirable characteristics
such as: resistance to extreme pH degradation, resistance to solvent degradation,
resistance to proteolytic degradation, resistance to detergent degradation, resistance
to oxidizing agent degradation, resistance to chaotropic agent degradation, and
resistance to general degradation. Cowan DA in Danson MJ et al. (1992) The
Archaebacteria, Biochemistry and Biotechnology at 149-159, University Press,
Cambridge, ISBN 1855780100. Here 'resistance' is intended to include any partial or
complete level of residual activity. When a polypeptide is described as thermal
tolerant it is understood that any one, more than one, or none of these other desirable
properties can be present.
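
The nested thermal tolerance ranges described above can be checked with a short sketch. The text qualifies each bound with "about", so the hard cutoffs used here are a simplifying assumption. The names `THERMAL_RANGES` and `narrowest_range` are hypothetical.

```python
# Ranges from the text, ordered from narrowest to widest (degrees C).
# Assumption: the approximate bounds are treated as exact and inclusive.
THERMAL_RANGES = [
    ("highly preferred", 50.0, 70.0),
    ("preferred", 30.0, 80.0),
    ("thermal tolerant", 15.0, 100.0),
]


def narrowest_range(functional_celsius: float):
    """Return the narrowest named range containing the temperature.

    Returns None when the temperature lies outside all three ranges.
    """
    for name, low, high in THERMAL_RANGES:
        if low <= functional_celsius <= high:
            return name
    return None


# The text's own example: a protein functioning at about 45 degrees C.
print(narrowest_range(45.0))  # preferred
```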

"Variant", as used herein, means a polynucleotide or polypeptide molecule 
that differs from a reference molecule. Variants can include nucleotide changes that 
result in amino acid substitutions, deletions, fusions, or truncations in the resulting
variant polypeptide when compared to the reference polypeptide.

"Vector," "extra-chromosomal vector" or "expression vector" refers to a first 
polynucleotide molecule, usually double-stranded, which may have inserted into it a
second polynucleotide molecule, for example a foreign or heterologous
polynucleotide. The heterologous polynucleotide molecule may or may not be
naturally found in the host cell, and may be, for example, one or more additional
copies of the heterologous polynucleotide naturally present in the host genome. The
vector is adapted for transporting the foreign polynucleotide molecule into a suitable
host cell. Once in the host cell, the vector may be capable of integrating into the
host cell chromosomes. The vector may optionally contain additional elements for
selecting cells containing the integrated polynucleotide molecule as well as elements
to promote transcription of mRNA from transfected DNA. Examples of vectors
useful in the methods of the present invention include, but are not limited to,
plasmids, bacteriophages, cosmids, retroviruses, and artificial chromosomes.

Within the application, unless otherwise stated, the techniques utilized may
be found in any of several well-known references, such as: Molecular Cloning: A
Laboratory Manual (Sambrook et al. (1989) Molecular cloning: A Laboratory
Manual), Gene Expression Technology (Methods in Enzymology, Vol. 185, edited
by D. Goeddel, 1991 Academic Press, San Diego, CA), "Guide to Protein
Purification" in Methods in Enzymology (M.P. Deutscher, ed., (1990) Academic
Press, Inc.), PCR Protocols: A Guide to Methods and Applications (Innis et al.
(1990) Academic Press, San Diego, CA), Culture of Animal Cells: A Manual of
Basic Technique, 2nd ed. (R.I. Freshney (1987) Liss, Inc., New York, NY), and Gene
Transfer and Expression Protocols (pp 109-128, ed. E.J. Murray, The Humana Press
Inc., Clifton, N.J.).

O-Glycoside Hydrolases:

Glycoside hydrolases are a large and diverse family of enzymes that
hydrolyse the glycosidic bond between two carbohydrate moieties or between a
carbohydrate and a non-carbohydrate moiety (See FIG. 2). Glycoside hydrolase
enzymes are classified into glycoside hydrolase (GH) families based on significant
amino acid similarities within their catalytic domains. Enzymes having related
catalytic domains are grouped together within a family (Henrissat et al., (1991)
supra, and Henrissat et al. (1996), Biochem. J. 316:695-696), where the underlying
classification provides a direct relationship between the GH domain amino acid
sequence and how a GH domain will fold. This information ultimately provides a
common mechanism for how the enzyme will hydrolyse the glycosidic bond within a
substrate, i.e., either by a retaining mechanism or inverting mechanism (Henrissat,
B, (1991) supra).

Cellulases belong to the GH family of enzymes. Cellulases are produced by
a variety of bacteria and fungi to degrade the beta-1,4 glycosidic bond of cellulose and
so to produce successively smaller fragments of cellulose and ultimately
glucose. At present, cellulases are found within at least 11 different GH families.
Three different types of cellulase enzyme activities have been identified within these
GH families: exo-acting cellulases which cleave successive disaccharide units from
the non-reducing ends of a cellulose chain; endo-acting cellulases which randomly
cleave bonds within the cellulose chain; and beta-glucosidases
which cleave disaccharide units to glucose (J. W. Deacon, (1997) Modern
Mycology, 3rd Ed., ISBN: 0-632-03077-1, 97-98).

Many cellulases are characterized by having a multiple domain unit within
their overall structure: a GH or catalytic domain is joined to a cellulose-binding
domain (CBD) by a glycosylated linker peptide (see FIG. 1) (Koivula et al., (1996)
Protein Expression and Purification 8:391-400). As noted above, cellulases do not
belong to any one family of GH domains, but rather have been identified within at
least 11 different GH families to date. The CBD type domain increases the
concentration of the enzyme on the substrate, in this case cellulose, and the linker
peptide provides flexibility for both larger domains.

Conversion of cellulose to glucose is an essential step in the production of
ethanol or other biofuels from biomass. Cellulases are an important component of
this process, where approximately one kilogram of cellulase can digest fifty
kilograms of cellulose. Within this process, thermostable cellulases have taken
precedence, due to their ability to function at elevated temperatures and under other
conditions including pH extremes, solvent presence, detergent presence, proteolysis,
etc. (see Cowan DA (1992), supra).

Highly thermostable cellulase enzymes are secreted by the cellulolytic
thermophile Acidothermus cellulolyticus (U.S. Patent Nos. 5,275,944 and
5,110,735). This bacterium was originally isolated from decaying wood in an acidic,
thermal pool at Yellowstone National Park and deposited with the American Type
Culture Collection (ATCC 43068) (Mohagheghi et al., (1986) Int. J. System.
Bacteriol., 36:435-443).

Recently, a thermostable cellulase, E1 endoglucanase, was identified and
characterized from Acidothermus cellulolyticus (U.S. Patent No. 5,536,655). The E1
endoglucanase has maximal activity between 75 and 83 °C and is active to a pH well
below 5. Thermostable cellulases, including E1 endoglucanase, are useful in the
conversion of biomass to biofuels, and in particular, are useful in the conversion of
cellulose to glucose. Conversion of biomass to biofuel represents an extremely
important alternative fuel source that is more environmentally friendly than
conventional fuels, and provides a use, in some cases, for waste products.

GuxA:

As described more fully in the Examples below, GuxA, a novel thermostable
cellulase, has now been identified and characterized. The predicted amino acid
sequence of GuxA (SEQ ID NO: 1) has an organization characteristic of a cellulase
enzyme. GuxA contains two catalytic domain-linker domain-cellulose binding
domain units, separated from each other by a centrally located fibronectin domain.
In particular, a first unit is located at the N-terminal end of the protein and includes a
GH6 domain (amino acids 54-476)-linker-CBD III (amino acids 584-733), and a
second unit is located at the C-terminal end of the protein and includes a GH12
domain (amino acids 860-1090)-linker-CBD II (amino acids 1128-1228). As
discussed in more detail below, significant amino acid similarity of GuxA to other
cellulases identifies GuxA as a cellulase.

GuxA, as noted above, has two catalytic domains, identified as belonging to
the GH6 and GH12 families. The GH6 domain family includes a number of
cellobiohydrolases, for example, exocellobiohydrolase A isolated from
Cellulomonas fimi, and exoglucanase E3 isolated from Thermobifida fusca. The
GH6 members degrade substrate using an inverting mechanism. The GH12 domain
family includes a number of endoglucanases, for example, endo-1,4-glucanase
isolated from Streptomyces lividans, and endo-1,4-glucanase S cellulase 12A
isolated from Streptomyces sp. 11AG8. The GH12 members degrade substrate using
a retaining mechanism.

Being a member of the GH6 and GH12 families of proteins identifies GuxA as
potentially having both exoglucanase and endoglucanase activity. In addition, the
predicted amino acid sequence (SEQ ID NO: 2) indicates that CBD type II and CBD
type III domains are present as characterized by Tomme P. et al. (1995), in
Enzymatic Degradation of Insoluble Polysaccharides (Saddler JN & Penner M, eds.),
at 142-163, American Chemical Society, Washington. See also Tomme, P. &
Claeyssens, M. (1989) FEBS Lett. 243, 239-243; Gilkes, N.R. et al., (1988)
J. Biol. Chem. 263, 10401-10407.

GuxA is also a thermostable cellulase as it is produced by the thermophile
Acidothermus cellulolyticus. As discussed, GuxA polypeptides can have other
desirable characteristics (see Cowan DA (1992), supra). Like other members of the
cellulase family, and in particular thermostable cellulases, GuxA polypeptides are
useful in the conversion of biomass to biofuels and biofuel additives, and in
particular, biofuels from cellulose. It is envisioned that GuxA polypeptides could be
used for other purposes, for example in detergents, pulp and paper processing, food
and feed processing, and in textile processes. GuxA polypeptides can be used alone
or in combination with one or more other cellulases or glycoside hydrolases to
perform the uses described herein or known within the relevant art, all of which are
within the scope of the present disclosure.

GuxA Polypeptides:

GuxA polypeptides of the invention include isolated polypeptides having an
amino acid sequence as shown below in Example 1, Table 1 and in SEQ ID NO:1, as
well as variants and derivatives, including fragments, having substantial identity to
the amino acid sequence of SEQ ID NO:1 and that retain any of the functional
activities of GuxA. GuxA polypeptide activity can be determined, for example, by
subjecting the variant, derivative, or fragment to a substrate binding assay or a
cellulase activity assay such as those described in Irwin D et al., J. Bacteriology
180(7):1709-1714 (April 1998).

Table 1. GuxA amino acid sequence. (SEQ ID NO: 1)

MERTQQSGRNCRYQRGTTRMPMSKRLRAGVLAGAVSIAASIW 

AGATFFVNPYWAQEVQSEAANQTNATLAAK^ 
5 YLDAALSQQQGTTPEVIEIVIYDLPGRDCAALASNGELPATAAGLQTYETQYIDPIASILSNPK 
YSSLRIVTIIEPDSLPNAVTNMSIQACATAVPYYEQGIEYALTKLHAIPNVYIYMDAAHSGWL 
GWPTSTNASGYVQEVQKVLNASIGVNGIDGFVTNTANYTPLKEPFMTATQQVGGQPVESANFY 
QWNPDIDEADYAVDLYSRLVAAGFPSSIGMLIDTLRNGWGGPNEPTGPSTATDVNTFVNQSK 
IDLRQHRGLWCNQNGAGLGQPPQASPTDFPNAHLDAYVWIKPPGESDGTSAASDPTTGKKS 

10 DPMCDPTYTTSYGVLTNALPNSPIAGQWFPAQFDQLVANARPAVPTSTSSSPPPPPPSPSASPS 
PSPSPSPSSSPSPSPSPSSSPSPSPSPSPSPSSSPSPSPSSSPSPSPSPSPSPSSSPSPSPSSSPSPSPSPSP 
SPSSSPSPSPTSSPVSGGLKVQYKNNDSAPGDNQIKPGLQLVNTGSSSVDLSTVTVRYWFTRD 
GGSSTLVYNCDWAAMGCGNIRASFGSVNPATPTADTYLQLSFTGGTLAAGGSTGEIQNRVN 
KSDWSNFTETNDYSYGTNTTFQDWTKVTVYVNGVLVWGTEPSGTSPSPTPSPSPSPSPSPGG 

15 DVTPPSWTGLWTGVSGSSVSLAWNASTDNVGVAHYNVYRNGVLVGQPTVTSFTDTGLAA 
GTAYTYTVAAVDAAGNTSAPSTPVTATTTSPSPSPTPTGTTVTDCTPGPNQNGVTSVQGDEY 
RVQTNEWNSSAQQCLTINTATGAWTVSTANFSGGTGGAPATYPSIYKGCHWGNCTTKNVG 
MPIQISQIGSAVTSWSTTQVSSGAYDVAYDIWTNSTPTTTGQPNGTEIMIWLNSRGGVQPFGS 
QTATGVTVAGHTWNVWQGQQTSWKIISYVLTPGATSISNLDLKAIFADAAARGSLNTSDYLL 

20 DVEAGFEIWQGGQGLGSNSFSVSVTSGTSSPTPSPSPTPTPSPTPTPSPSPTPSPSPTSSPSSSGV 
ACRATYVVNSDWGSGFTATWVTNTGSRATNGWTVAWSFGGNQTVTNYWNTALTQSGAS 
VTATNLSYNNVIQPGQSTTFGFNGSYSGTNAAPTLSCTAS 



As listed and described in Tables 1 and 5, the isolated GuxA polypeptide
includes an N-terminal hydrophobic region that functions as a signal peptide, having
an amino acid sequence that begins with Met1 and extends to about Ala53; a first
catalytic domain having significant sequence similarity to a GH6 family domain that
begins with about Ala54 and extends to about Val476; a cellulose binding domain type
III region that begins with about Val584 and extends to about Glu733; a fibronectin
type III domain that begins with about Asp756 and extends to about Val840; a second
catalytic domain having significant sequence similarity to a GH12 family domain that
begins with about Asp860 and extends to about Gly1090; and a cellulose binding
domain type II that begins with about Gly1128 and extends to about Ser1228. Variants
and derivatives of GuxA include, for example, GuxA polypeptides modified by
covalent or aggregative conjugation with other chemical moieties, such as glycosyl
groups, polyethylene glycol (PEG) groups, lipids, phosphate, acetyl groups, and the
like.
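
The domain boundaries listed above can be captured in a small lookup table. The following Python sketch is illustrative only and is not part of the disclosed methods: the names GUXA_DOMAINS, domain_at, and domain_length are hypothetical, and the coordinates are the approximate ("about") residue positions given in the text.

```python
# Approximate GuxA domain boundaries (1-based, inclusive), as stated in the text.
# The dictionary and function names are illustrative assumptions.
GUXA_DOMAINS = {
    "signal peptide": (1, 53),
    "GH6 catalytic domain": (54, 476),
    "CBD type III": (584, 733),
    "fibronectin type III": (756, 840),
    "GH12 catalytic domain": (860, 1090),
    "CBD type II": (1128, 1228),
}


def domain_at(position):
    """Return the name of the domain containing a residue position.

    Positions that fall between the listed domains (for example, in the
    linker regions) are reported as 'linker/unassigned'.
    """
    for name, (start, end) in GUXA_DOMAINS.items():
        if start <= position <= end:
            return name
    return "linker/unassigned"


def domain_length(name):
    """Number of residues spanned by the named domain."""
    start, end = GUXA_DOMAINS[name]
    return end - start + 1
```

Under these assumed boundaries, a residue such as position 500 falls between the GH6 domain and the CBD type III region and is therefore reported as linker or unassigned sequence.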

The amino acid sequence of GuxA polypeptides of the invention is in some
embodiments about 60% identical, and in other embodiments about 70% identical,
or in some embodiments about 90% identical, to the GuxA amino acid sequence
shown above in Table 1 and SEQ ID NO: 1. The percentage identity, also termed
homology (see definition above), can be readily determined, for example, by comparing
the two polypeptide sequences using any of the computer programs commonly
employed for this purpose, such as the Gap program (Wisconsin Sequence Analysis
Package, Version 8 for Unix, Genetics Computer Group, University Research Park,
Madison Wisconsin), which uses the algorithm of Smith and Waterman, 1981, Adv.
Appl. Math. 2:482-489.
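
As a rough illustration of what a percentage identity figure means, the sketch below counts matching residues between two sequences that have already been aligned. It is not the Gap program and performs no alignment itself; the function name percent_identity is hypothetical, and the choice to ignore gapped columns in the denominator is an assumption (alignment programs differ on this convention).

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over aligned columns in which neither sequence has a gap.

    Both inputs must be pre-aligned strings of equal length, with gaps
    written as the `gap` character.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared
```

For example, two aligned ten-residue fragments that differ at two positions would score 80% identity under this convention.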

Variants and derivatives of the GuxA polypeptide may further include, for
example, fusion proteins formed of a GuxA polypeptide and a heterologous
polypeptide. Preferred heterologous polypeptides include those that facilitate
purification, oligomerization, stability, or secretion of the GuxA polypeptides.

GuxA polypeptide fragments may include, but are not limited to, the 
polypeptide sequences listed in Table 5, SEQ ID NOS: 3, 4, 5, 6, 7 and 8. 

GuxA polypeptide variants and derivatives, as used in the description of the
invention, can contain conservatively substituted amino acids, meaning that one or
more amino acids can be replaced by an amino acid that does not alter the secondary
and/or tertiary structure of the polypeptide. Such substitutions can include the
replacement of an amino acid by a residue having similar physicochemical properties,
such as substituting one aliphatic residue (Ile, Val, Leu, or Ala) for another, or
substitutions between basic residues Lys and Arg, acidic residues Glu and Asp, amide
residues Gln and Asn, hydroxyl residues Ser and Tyr, or aromatic residues Phe and
Tyr. Phenotypically silent amino acid exchanges are described more fully in Bowie et
al., 1990, Science 247:1306-1310. In addition, functional GuxA polypeptide variants
include those having amino acid substitutions, deletions, or additions to the amino acid
sequence outside functional regions of the protein, for example, outside the catalytic
and cellulose binding domains. These would include, for example, the various linker
sequences that connect functional domains as defined herein.
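
The substitution groups named above can be encoded directly. The following Python sketch is a minimal illustration, assuming one-letter amino acid codes; the names CONSERVATIVE_GROUPS and is_conservative are hypothetical, and the groups are limited to those listed in the text rather than any published substitution matrix.

```python
# Substitution groups as listed in the text, in one-letter amino acid codes.
CONSERVATIVE_GROUPS = [
    set("IVLA"),  # aliphatic: Ile, Val, Leu, Ala
    set("KR"),    # basic: Lys, Arg
    set("ED"),    # acidic: Glu, Asp
    set("QN"),    # amide: Gln, Asn
    set("SY"),    # hydroxyl: Ser, Tyr
    set("FY"),    # aromatic: Phe, Tyr
]


def is_conservative(original, replacement):
    """True if the two different residues share one of the listed groups."""
    original = original.upper()
    replacement = replacement.upper()
    if original == replacement:
        return False  # identical residues are not a substitution
    return any(
        original in group and replacement in group
        for group in CONSERVATIVE_GROUPS
    )
```

Note that Tyr appears in two groups, so Ser to Tyr and Phe to Tyr both count as conservative here, while Ser to Phe does not.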

The GuxA polypeptides of the present invention are preferably provided in an
isolated form, and preferably are substantially purified. The polypeptides may be
recovered and purified from recombinant cell cultures by known methods, including,
for example, ammonium sulfate or ethanol precipitation, anion or cation exchange
chromatography, phosphocellulose chromatography, hydrophobic interaction
chromatography, affinity chromatography, hydroxylapatite chromatography, and lectin
chromatography. Preferably, high performance liquid chromatography (HPLC) is
employed for purification.

Another embodiment of the invention provides for forms of GuxA polypeptide
that are recombinant polypeptides expressed by suitable hosts.
Furthermore, the hosts can simultaneously produce other cellulases such that a mixture
is produced comprising a GuxA polypeptide and one or more other cellulases. Such a
mixture can be effective in crude fermentation processing or other industrial
processing.

GuxA polypeptides can be fused to heterologous polypeptides to facilitate
purification. Many available heterologous peptides (peptide tags) allow selective
binding of the fusion protein to a binding partner. Non-limiting examples of peptide
tags include 6-His, thioredoxin, hemagglutinin, GST, and the OmpA signal sequence
tag. A binding partner that recognizes and binds to the heterologous peptide can be
any molecule or compound, including metal ions (for example, metal affinity
columns), antibodies, antibody fragments, or any protein or peptide that preferentially
binds the heterologous peptide to permit purification of the fusion protein.

GuxA polypeptides can be modified to facilitate formation of GuxA oligomers.
For example, GuxA polypeptides can be fused to peptide moieties that promote
oligomerization, such as leucine zippers and certain antibody fragment polypeptides,
for example, Fc polypeptides. Techniques for preparing these fusion proteins are
known, and are described, for example, in WO 99/31241 and in Cosman et al., 2001
Immunity 14:123-133. Fusion to an Fc polypeptide offers the additional advantage of
facilitating purification by affinity chromatography over Protein A or Protein G
columns. Fusion to a leucine-zipper (LZ), for example, a repetitive heptad repeat,
often with four or five leucine residues interspersed with other amino acids, is
described in Landschulz et al., 1988, Science, 240:1759.

It is also envisioned that an expanded set of variants and derivatives of GuxA
polynucleotides and/or polypeptides can be generated to select for useful molecules,
where such expansion is achieved not only by conventional methods such as
site-directed mutagenesis (SDM) but also by more modern techniques, either independently
or in combination.

Site-directed mutagenesis is considered an informational approach to protein
engineering and can rely on high-resolution crystallographic structures of target
proteins and some stratagem for specific amino acid changes (Van Den Burg, B.;
Vriend, G.; Veltman, O.R.; Venema, G.; Eijsink, V.G.H. Proc. Natl. Acad. Sci. U.S.A.
1998, 95, 2056-2060). For example, modification of the amino acid sequence of
GuxA polypeptides can be accomplished as is known in the art, such as by introducing
mutations at particular locations by oligonucleotide-directed mutagenesis (Walder et
al., 1986, Gene, 42:133; Bauer et al., 1985, Gene 37:73; Craik, 1985, BioTechniques,
12-19; Smith et al., 1981, Genetic Engineering: Principles and Methods, Plenum
Press; and U.S. Patent No. 4,518,584 and U.S. Patent No. 4,737,462). SDM
technology can also employ the recent advent of computational methods for identifying
site-specific changes for a variety of protein engineering objectives (Hellinga, H.W.
Nature Struct. Biol. 1998, 5, 525-527).

The more modern techniques include, but are not limited to, non-informational
mutagenesis techniques (referred to generically as "directed evolution"). Directed
evolution, in conjunction with high-throughput screening, allows testing of statistically
meaningful variations in protein conformation (Arnold, F.H. Nature Biotechnol. 1998,
16, 617-618). Directed evolution technology can include diversification methods
similar to that described by Crameri A. et al. (1998, Nature 391: 288-291),
site-saturation mutagenesis, staggered extension process (StEP) (Zhao, H.; Giver, L.; Shao,
Z.; Affholter, J. A.; Arnold, F.H. Nature Biotechnol. 1998, 16, 258-262), and DNA
synthesis/reassembly (U.S. Patent 5,965,408).

Fragments of the GuxA polypeptide can be used, for example, to generate
specific anti-GuxA antibodies. Using known selection techniques, specific epitopes
can be selected and used to generate monoclonal or polyclonal antibodies. Such
antibodies have utility in the assay of GuxA activity as well as in purifying
recombinant GuxA polypeptides from genetically engineered host cells.

GuxA Polynucleotides:

The invention also provides polynucleotide molecules encoding the GuxA
polypeptides discussed above. GuxA polynucleotide molecules of the invention
include polynucleotide molecules having the nucleic acid sequence shown in Table 2
and SEQ ID NO: 2; polynucleotide molecules that hybridize to the nucleic acid
sequence of Table 2 and SEQ ID NO: 2 under high stringency hybridization
conditions (for example, 42°C, 2.5 hr., 6X SSC, 0.1% SDS); and polynucleotide
molecules having substantial nucleic acid sequence identity with the nucleic acid
sequence of Table 2 and SEQ ID NO: 2, particularly with those nucleic acids
encoding the two catalytic domains, GH6 (from amino acid 54 to 476) and GH12
(from amino acid 860 to 1090), the cellulose binding domain III (from amino acid
584 to 733) and cellulose binding domain II (from amino acid 1128 to 1228).



Table 2. GuxA nucleotide sequence. (SEQ ID NO: 2) 

ATGGAGCGAACCCAAGAATCCGGACGGAACTGCAGGTACCAGAGAGGAACGACACGAA
TGCCCGCCATCTCAAAACGGCTGCGAGCCGGCGTCCTCGCCGGGGCGGTGAGCATCGCA 
GCCTCCATCGTGCCGCTGGCGATGCAGCATCCTGCCATCGCCGCGACGCACGTCGACAAT 
CCCTATGCGGGAGCGACCTTCTTCGTCAACCCGTACTGGGCGCAAGAAGTACAGAGCGA 
AGCGGCGAACCAGACCAATGCCACTCTCGCAGCGAAAATGCGCGTCGTTTCCACATATTC 

20 GACGGCCGTCTGGATGGACCGCATCGCTGCGATCAACGGCGTCAACGGCGGACCCGGCT 
TGACGACATATCTGGACGCCGCCCTCTCCCAGCAGCAGGGAACCACCCCTGAAGTCATTG 
AGATTGTCATCTACGATCTGCCGGGACGCGACTGCGCGGCGCTCGCCTCCAACGGCGAA 
CTGCCCGCTACGGCAGCAGGTTTGCAGACCTATGAAACGCAGTACATCGATCCGATTGCG 
AGTATCCTGAGCAATCCGAAGTACTCCAGCCTGCGGATCGTGACGATCATTGAGCCGGA 

25 CTCGCTGCCAAACGCGGTCACCAATATGAGCATTCAAGCGTGTGCAACGGCGGTGCCGT 
ATTACGAGCAAGGCATCGAGTACGCGCTCACGAAATTGCACGCCATTCCGAACGTGTAC 
ATCTACATGGACGCCGCCCACTCCGGCTGGCTTGGGTGGCCCAATAATGCCAGCGGATAC 
GTACAGGAAGTCCAGAAGGTCCTCAACGCGAGCATCGGGGTCAACGGCATCGACGGCTT 
CGTCACCAACACGGCGAATTACACGCCGTTGAAGGAGCCGTTCATGACCGCCACCCAGC 

30 AGGTCGGCGGACAGCCGGTGGAGTCGGCGAATTTCTACCAGTGGAATCCTGACATCGAC 
GAAGCCGACTACGCGGTTGACTTGTACTCGCGGCTCGTCGCCGCTGGCTTTCCAAGCAGC 
ATCGGCATGCTCATCGACACCTTACGCAACGGTTGGGGTGGTCCGAACGAACCAACAGG 
CCCGAGCACCGCGACCGATGTCAACACCTTCGTCAACCAGTCGAAGATTGACCTTCGGCA 
GCACCGCGGCCTGTGGTGCAACCAGAACGGTGCGGGCCTCGGCCAGCCGCCGCAGGCAA 

35 GCCCGACGGACTTCCCGAACGCGCACCTCGACGCGTATGTCTGGATCAAGCCGCCGGGT 
GAGTCGGACGGCACAAGCGCTGCGAGCGATCCGACAACTGGCAAGAAGTCGGACCCCAT 
GTGCGACCCGACGTACACGACGTCGTACGGGGTACTGACCAACGCGTTACCGAACTCCC 
CGATCGCCGGCCAGTGGTTCCCGGCGCAGTTTGACCAGCTTGTCGCGAACGCACGGCCA 
GCGGTGCCGACGTCGACCAGCTCGAGCCCGCCGCCTCCGCCGCCGAGTCCGTCGGCTTCG 

40 CCGAGTCCGAGCCCGAGTCCGAGCCCGAGCAGCTCGCCATCGCCGTCGCCGTCTCCGAGC 
TCGAGCCCGTCTCCGTCGCCGAGCCCGAGTCCGAGCCCGAGTAGCTCGCCGTCGCCGTCT 
CCGAGCTCGAGCCCGTCTCCGTCGCCGAGCCCGAGTCCGAGCCCGAGTAGCTCGCCGTCG 
CCGTCTCCGAGCTCGAGCCCGTCTCCGTCGCCGAGCCCGAGTCCGAGCCCGAGTAGCTCG 
CCGTCGCCGTCTCCGACGTCGTCGCCGGTGTCGGGTGGGCTGAAGGTGCAGTACAAGAA
CAATGATTCGGCGCCGGGTGATAACCAGATCAAACCGGGTCTCCAGTTGGTGAATACCG
GGTCGTCGTCGGTGGATTTGTCGACGGTGACGGTGCGGTACTGGTTCACCCGGGATGGT 
GGGTCGTCGACACTGGTGTACAACTGTGACTGGGCGGCGATGGGGTGTGGGAATATCCG 
CGCCTCGTTCGGCTCGGTGAACCCGGCGACGCCGACGGCGGACACCTACCTGCAGTTGTC 
5 GTTCACTGGTGGAACGTTGGCCGCTGGTGGGTCGACGGGTGAGATTCAAAACCGGGTGA 
ATAAGAGTGACTGGTCGAATTTCACCGAGACCAATGACTACTCGTATGGGACGAACACC 
ACCTTCCAGGACTGGACGAAGGTGACGGTGTACGTCAACGGCGTGTTGGTGTGGGGGAC 
TGAACCGTCCGGCACCAGCCCCAGCCCCACACCATCCCCGAGCCCGAGCCCGAGCCCGA 
GCCCGGGTGGGGATGTGACGCCGCCGAGTGTGCCGACCGGCTTGGTGGTGACGGGGGTG 

10 AGTGGGTCGTCGGTGTCGTTGGCGTGGAATGCGTCGACGGATAACGTGGGGGTGGCGCA 
TTACAACGTGTACCGCAACGGGGTGTTGGTGGGCCAGCCGACGGTGACCTCGTTCACCG 
ACACGGGTTTGGCCGCGGGAACCGCGTACACCTACACGGTGGCCGCGGTGGACGCTGCG 
GGTAACACCTCCGCCCCATCCACCCCCGTCACCGCCACCACCACGAGTCCCAGCCCCAGC 
CCCACGCCGACGGGGACCACGGTCACCGACTGCACGCCCGGTCCTAACCAGAATGGTGT 

15 GACCAGCGTGCAGGGCGACGAATACCGGGTGCAGACCAATGAGTGGAATTCGTCGGCCC 
AGCAGTGCCTCACCATCAATACCGCGACCGGTGCCTGGACGGTGAGCACTGCGAACTTC 
AGCGGTGGGACCGGCGGTGCGCCCGCGACGTATCCGTCGATCTACAAGGGCTGCCACTG 
GGGCAACTGCACCACGAAGAACGTCGGGATGCCGATTCAGATCAGTCAGATTGGTTCGG 
CTGTGACGTCGTGGAGTACGACGCAGGTGTCGTCGGGCGCGTATGACGTGGCCTACGAC 

20 ATTTGGACGAACAGTACCCCAACGACAACCGGTCAGCCAAACGGTACCGAAATCATGAT 
TTGGCTGAATTCGCGTGGTGGGGTGCAGCCGTTCGGGTCGCAGACAGCGACGGGTGTGA 
CGGTCGCTGGTCACACGTGGAATGTCTGGCAGGGTCAGCAGACCTCGTGGAAGATTATT 
TCCTACGTCCTGACCCCCGGTGCGACGTCGATCAGTAATCTGGATTTGAAGGCGATTTTC 
GCGGACGCCGCGGCACGCGGGTCGCTCAACACCTCCGATTACCTGCTCGACGTTGAGGC 

25 CGGGTTTGAGATCTGGCAAGGTGGTCAGGGCCTGGGCAGCAACTCGTTCAGCGTCTCCG 
TGACGAGCGGCACGTCCAGCCCGACACCGAGCCCGAGCCCGACGCCGACACCGAGCCCG 
ACGCCGACACCGTCTCCGAGCCCGACCCCGTCGCCGAGTCCGACCAGCTCGCCGTCGTCG 
TCGGGTGTGGCGTGCCGGGCGACGTATGTGGTGAATAGTGATTGGGGTTCTGGGTTTAC 
GGCGACGGTGACGGTGACGAATACCGGGAGCCGGGCGACGAACGGGTGGACGGTGGCG 

30 TGGTCGTTTGGTGGGAATCAGACGGTCACGAACTACTGGAACACTGCGTTGACCCAATC 
AGGTGCATCGGTGACGGCGACGAACCTGAGTTACAACAACGTGATCCAACCGGGTCAGT 
CGACCACCTTCGGATTCAACGGAAGTTACTCAGGAACAAACGCCGCGCCGACGCTCAGC 
TGCACAGCCAGCTGA 



The GuxA polynucleotide molecules of the invention are preferably isolated
molecules encoding the GuxA polypeptide having an amino acid sequence as shown in
Table 1 and SEQ ID NO: 1, as well as derivatives, variants, and useful fragments of
the GuxA polynucleotide. The GuxA polynucleotide sequence can include deletions,
substitutions, or additions to the nucleic acid sequence of Table 2 and SEQ ID NO: 2.

The GuxA polynucleotide molecule of the invention can be cDNA, chemically
synthesized DNA, DNA amplified by PCR, RNA, or combinations thereof. Due to
the degeneracy of the genetic code, two DNA sequences may differ and yet encode
identical amino acid sequences. The present invention thus provides an isolated
polynucleotide molecule having a GuxA nucleic acid sequence encoding GuxA
polypeptide, where the nucleic acid sequence encodes a polypeptide having the
complete amino acid sequence as shown in Table 1 and SEQ ID NO: 1, or variants,
derivatives, and fragments thereof.
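
The degeneracy of the genetic code noted above can be illustrated with a short translation routine. The Python sketch below is an illustration under stated assumptions: it uses the standard genetic code, the names CODON_TABLE and translate are hypothetical, and the second example sequence is an invented synonymous variant rather than a sequence from this disclosure.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA coding sequence, reading frame 1, into one-letter codes.

    Trailing bases that do not form a complete codon are ignored, and stop
    codons are written as '*'.
    """
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# First four codons of Table 2, and an invented synonymous variant.
table2_start = "ATGGAGCGAACC"
synonymous = "ATGGAACGTACG"
```

Both sequences translate to the same four residues (Met-Glu-Arg-Thr) even though they differ at three nucleotide positions, which is the situation the paragraph above describes.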

The GuxA polynucleotides of the invention have a nucleic acid sequence that
is in some embodiments about 60% identical to the nucleic acid sequence shown in
Table 2 and SEQ ID NO: 2, in some embodiments about 70% identical to the nucleic
acid sequence shown in Table 2 and SEQ ID NO: 2, and in other embodiments about
90% identical to the nucleic acid sequence shown in Table 2 and SEQ ID NO: 2.
Nucleic acid sequence identity is determined by known methods, for example by
aligning two sequences in a software program such as the BLAST program (Altschul,
S.F. et al. (1990) J. Mol. Biol. 215:403-410), from the National Center for
Biotechnology Information (http://www.ncbi.nlm.nih.gov/BLAST/).

The GuxA polynucleotide molecules of the invention also include isolated
polynucleotide molecules having a nucleic acid sequence that hybridizes under high
stringency conditions (as defined above) to the nucleic acid sequence shown in Table
2 and SEQ ID NO: 2. Hybridization of the polynucleotide is to at least about 15
contiguous nucleotides, or at least about 20 contiguous nucleotides, and in other
embodiments at least about 30 contiguous nucleotides, and in still other embodiments
at least about 100 contiguous nucleotides of the nucleic acid sequence shown in Table
2 and SEQ ID NO: 2.

Useful fragments of the GuxA-encoding polynucleotide molecules described
herein include probes and primers. Such probes and primers can be used, for
example, in PCR methods to amplify and detect the presence of GuxA polynucleotides
in vitro, as well as in Southern and Northern blots for analysis of GuxA. Cells
expressing the GuxA polynucleotide molecules of the invention can also be identified
by the use of such probes. Methods for the production and use of such primers and
probes are known. For PCR, 5' and 3' primers corresponding to a region at the termini
of the GuxA polynucleotide molecule can be employed to isolate and amplify the
GuxA polynucleotide using conventional techniques.
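
The idea of taking 5' and 3' primers from the termini of a polynucleotide can be sketched as follows. This is a simplified illustration, not a primer design protocol: the function names reverse_complement and terminal_primers are hypothetical, the default primer length is an arbitrary assumption, and practical primer design would also weigh melting temperature, GC content, and secondary structure, none of which is modeled here.

```python
# Translation table mapping each base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def terminal_primers(template, length=20):
    """Return (forward, reverse) primers taken from the template termini.

    The forward primer is the first `length` bases of the coding strand; the
    reverse primer is the reverse complement of the last `length` bases, so
    that both primers are written 5' to 3'.
    """
    if len(template) < length:
        raise ValueError("template is shorter than the requested primer length")
    forward = template[:length].upper()
    reverse = reverse_complement(template[-length:])
    return forward, reverse
```

Applied to a coding sequence such as that of Table 2, the forward primer would begin at the ATG start codon and the reverse primer would anneal over the terminal stop codon.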

Other useful fragments of the GuxA polynucleotides include antisense or sense
oligonucleotides comprising a single-stranded nucleic acid sequence capable of
binding to a target GuxA mRNA (using a sense strand), or DNA (using an antisense
strand) sequence.

Vectors and Host Cells:

The present invention also provides vectors containing the polynucleotide
molecules of the invention, as well as host cells transformed with such vectors. Any of
the polynucleotide molecules of the invention may be contained in a vector, which
generally includes a selectable marker and an origin of replication, for propagation in a
host. The vectors further include suitable transcriptional or translational regulatory
sequences, such as those derived from mammalian, microbial, viral, or insect genes,
operably linked to the GuxA polynucleotide molecule. Examples of such regulatory
sequences include transcriptional promoters, operators, or enhancers, mRNA
ribosomal binding sites, and appropriate sequences which control transcription and
translation. Nucleotide sequences are operably linked when the regulatory sequence
functionally relates to the DNA encoding the target protein. Thus, a promoter
nucleotide sequence is operably linked to a GuxA DNA sequence if the promoter
nucleotide sequence directs the transcription of the GuxA sequence.

Selection of suitable vectors for the cloning of GuxA polynucleotide molecules
encoding the target GuxA polypeptides of this invention will depend upon the host cell
into which the vector will be transformed, and, where applicable, the host cell from
which the target polypeptide is to be expressed. Suitable host cells for expression of
GuxA polypeptides include prokaryotes, yeast, and higher eukaryotic cells, each of
which is discussed below.

The GuxA polypeptides to be expressed in such host cells may also be fusion
proteins that include regions from heterologous proteins. As discussed above, such
regions may be included to allow, for example, secretion, improved stability, or
facilitated purification of the GuxA polypeptide. For example, a nucleic acid sequence
encoding an appropriate signal peptide can be incorporated into an expression vector.
A nucleic acid sequence encoding a signal peptide (secretory leader) may be fused
in-frame to the GuxA sequence so that GuxA is translated as a fusion protein
comprising the signal peptide. A signal peptide that is functional in the intended host
cell promotes extracellular secretion of the GuxA polypeptide. Preferably, the signal
sequence will be cleaved from the GuxA polypeptide upon secretion of GuxA from the
cell. Non-limiting examples of signal sequences that can be used in practicing the
invention include the yeast α-factor and the honeybee melittin leader in Sf9 insect cells.

Suitable host cells for expression of target polypeptides of the invention
include prokaryotes, yeast, and higher eukaryotic cells. Suitable prokaryotic hosts to
be used for the expression of these polypeptides include bacteria of the genera
Escherichia, Bacillus, and Salmonella, as well as members of the genera
Pseudomonas, Streptomyces, and Staphylococcus. For expression in prokaryotic cells,
for example, in E. coli, the polynucleotide molecule encoding GuxA polypeptide
preferably includes an N-terminal methionine residue to facilitate expression of the
recombinant polypeptide. The N-terminal Met may optionally be cleaved from the
expressed polypeptide.

Expression vectors for use in prokaryotic hosts generally comprise one or more
phenotypic selectable marker genes. Such genes encode, for example, a protein that
confers antibiotic resistance or that supplies an auxotrophic requirement. A wide
variety of such vectors are readily available from commercial sources. Examples
include pSPORT vectors, pGEM vectors (Promega, Madison, WI), pPROEX vectors
(LTI, Bethesda, MD), Bluescript vectors (Stratagene), and pQE vectors (Qiagen).

GuxA can also be expressed in yeast host cells from genera including
Saccharomyces, Pichia, and Kluyveromyces. Preferred yeast hosts are S. cerevisiae and
P. pastoris. Yeast vectors will often contain an origin of replication sequence from a
2μ yeast plasmid, an autonomously replicating sequence (ARS), a promoter region,
sequences for polyadenylation, sequences for transcription termination, and a
selectable marker gene. Vectors replicable in both yeast and E. coli (termed shuttle
vectors) may also be used. In addition to the above-mentioned features of yeast
vectors, a shuttle vector will also include sequences for replication and selection in E.
coli. Direct secretion of the target polypeptides expressed in yeast hosts may be
accomplished by the inclusion of a nucleotide sequence encoding the yeast α-factor
leader sequence at the 5' end of the GuxA-encoding nucleotide sequence.

Insect host cell culture systems can also be used for the expression of GuxA
polypeptides. The target polypeptides of the invention are preferably expressed using a
baculovirus expression system, as described, for example, in the review by Luckow
and Summers, 1988 Bio/Technology 6:47.

The choice of a suitable expression vector for expression of GuxA
polypeptides of the invention will depend upon the host cell to be used. Examples of
suitable expression vectors for E. coli include pET, pUC, and similar vectors as is
known in the art. Preferred vectors for expression of the GuxA polypeptides include
the shuttle plasmid pIJ702 for Streptomyces lividans, pGAPZalpha-A, B, C and
pPICZalpha-A, B, C (Invitrogen) for Pichia pastoris, and pFE-1 and pFE-2 for
filamentous fungi and similar vectors as is known in the art.

Modification of a GuxA polynucleotide molecule to facilitate insertion into a
particular vector (for example, by modifying restriction sites), ease of use in a
particular expression system or host (for example, using preferred host codons), and
the like, are known and are contemplated for use in the invention. Genetic
engineering methods for the production of GuxA polypeptides include the
expression of the polynucleotide molecules in cell free expression systems, in
cellular hosts, in tissues, and in animal models, according to known methods.

Compositions

The invention provides compositions containing a substantially purified
GuxA polypeptide of the invention and an acceptable carrier. Such compositions are
administered to biomass, for example, to degrade the cellulose in the biomass into
simpler carbohydrate units and ultimately, to sugars. These released sugars from the
cellulose are converted into ethanol by any number of different catalysts. Such
compositions may also be included in detergents for removal, for example, of
cellulose containing stains within fabrics, or compositions used in the pulp and paper
industry, to address conditions associated with cellulose content. Compositions of
the present invention can be used in stonewashing jeans such as is well known in the
art. Compositions can be used in the biopolishing of cellulosic fabrics, such as
cotton, linen, rayon and Lyocell.

The invention provides pharmaceutical compositions containing a
substantially purified GuxA polypeptide of the invention and if necessary a
pharmaceutically acceptable carrier. Such pharmaceutical compositions are
administered to cells, tissues, or patients, for example, to aid in delivery or targeting
of other pharmaceutical compositions. For example, GuxA polypeptides may be
used where carbohydrate-mediated liposomal interactions are involved with target
cells. Vyas SP et al. (2001), J. Pharmacy & Pharmaceutical Sciences May-Aug
4(2):138-58.

The invention also provides reagents, compositions, and methods that are
useful for analysis of GuxA activity and for the analysis of cellulose breakdown.

Compositions of the present invention may also include other known
cellulases, and preferably, other known thermal tolerant cellulases for enhanced
treatment of cellulose.

Antibodies

The polypeptides of the present invention, in whole or in part, may be used to
raise polyclonal and monoclonal antibodies that are useful in purifying GuxA, or
detecting GuxA polypeptide expression, as well as a reagent tool for characterizing the
molecular actions of the GuxA polypeptide. Preferably, a peptide containing a unique
epitope of the GuxA polypeptide is used in preparation of antibodies, using
conventional techniques. Methods for the selection of peptide epitopes and production
of antibodies are known. See, for example, Antibodies: A Laboratory Manual, Harlow
and Lane (eds.), 1988 Cold Spring Harbor Laboratory Press, Cold Spring Harbor,
N.Y.; Monoclonal Antibodies, Hybridomas: A New Dimension in Biological
Analyses, Kennett et al. (eds.), 1980 Plenum Press, New York.

Assays 

Agents that modify (for example, increase or decrease) GuxA hydrolysis or
degradation of cellulose can be identified, for example, by assay of GuxA cellulase
activity and/or analysis of GuxA binding to a cellulose substrate. Incubation of
cellulose in the presence of GuxA and in the presence or absence of a test agent, and
correlation of cellulase activity or cellulose binding, permits screening of such
agents. For example, cellulase activity and binding assays may be performed in a
manner similar to those described in Irwin et al., J. Bacteriology 180(7): 1709-1714
(April 1998).

The GuxA-stimulated activity is determined in the presence and absence of a
test agent and then compared. A lower GuxA-activated test activity in the presence
of the test agent than in the absence of the test agent indicates that the test agent has
decreased the activity of the GuxA. A higher GuxA-activated test activity in the
presence of the test agent than in the absence of the test agent indicates that the test
agent has increased the activity of the GuxA. Stimulators and inhibitors of GuxA
may be used to augment, inhibit, or modify GuxA-mediated activity, and therefore
may have potential industrial uses as well as potential use in the further elucidation
of GuxA's molecular actions.
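
The comparison described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `classify_agent`, the use of replicate means, and the 5% tolerance band are assumptions introduced here and are not part of the assay described in the specification.

```python
from statistics import mean


def classify_agent(activity_with_agent, activity_without_agent, tolerance=0.05):
    """Compare cellulase activity measured with and without a test agent.

    Both arguments are lists of replicate activity readings in the same
    (arbitrary) units. The tolerance band is a hypothetical choice that
    keeps small measurement noise from being called an effect.
    """
    with_mean = mean(activity_with_agent)
    without_mean = mean(activity_without_agent)
    if without_mean <= 0:
        raise ValueError("control activity must be positive")

    ratio = with_mean / without_mean
    if ratio > 1 + tolerance:
        return "stimulator"   # higher activity in the presence of the agent
    if ratio < 1 - tolerance:
        return "inhibitor"    # lower activity in the presence of the agent
    return "no effect"


if __name__ == "__main__":
    # Made-up readings, for illustration only.
    print(classify_agent([0.42, 0.40], [0.80, 0.84]))
```

In practice, a statistical test on the replicates would likely be preferable to a fixed tolerance band.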

Therapeutic Applications

The GuxA polypeptides of the invention are effective in aiding in delivery or
targeting of other pharmaceutical compositions within a host. For example, GuxA
polypeptides may be used where carbohydrate-mediated liposomal interactions are
involved with target cells. Vyas SP et al. (2001), J. Pharm. Pharm. Sci. May-Aug
4(2): 138-58.

GuxA polynucleotides and polypeptides of the invention, including vectors expressing GuxA,
can be formulated as pharmaceutical compositions and administered
to a host, preferably a mammalian host, including a human patient, in a variety of
forms adapted to the chosen route of administration. The compounds are preferably
administered in combination with a pharmaceutically acceptable carrier, and may be
combined with or conjugated to specific delivery agents, including targeting
antibodies and/or cytokines.

GuxA can be administered by known techniques, such as orally, parenterally
(including subcutaneous injection, intravenous, intramuscular, intrasternal or
infusion techniques), by inhalation spray, topically, by absorption through a mucous
membrane, or rectally, in dosage unit formulations containing conventional
non-toxic pharmaceutically acceptable carriers, adjuvants or vehicles. Pharmaceutical
compositions of the invention can be in the form of suspensions or tablets suitable
for oral administration, nasal sprays, creams, or sterile injectable preparations, such as
sterile injectable aqueous or oleaginous suspensions, or suppositories.

For oral administration as a suspension, the compositions can be prepared
according to techniques well-known in the art of pharmaceutical formulation. The
compositions can contain microcrystalline cellulose for imparting bulk, alginic acid
or sodium alginate as a suspending agent, methylcellulose as a viscosity enhancer,
and sweeteners or flavoring agents. As immediate release tablets, the compositions
can contain microcrystalline cellulose, starch, magnesium stearate and lactose or
other excipients, binders, extenders, disintegrants, diluents and lubricants known in
the art.

For administration by inhalation or aerosol, the compositions can be prepared
according to techniques well-known in the art of pharmaceutical formulation. The
compositions can be prepared as solutions in saline, using benzyl alcohol or other
suitable preservatives, absorption promoters to enhance bioavailability,
fluorocarbons or other solubilizing or dispersing agents known in the art.

For administration as injectable solutions or suspensions, the compositions
can be formulated according to techniques well-known in the art, using suitable
dispersing or wetting and suspending agents, such as sterile oils, including synthetic
mono- or diglycerides, and fatty acids, including oleic acid.

For rectal administration as suppositories, the compositions can be prepared
by mixing with a suitable non-irritating excipient, such as cocoa butter, synthetic
glyceride esters or polyethylene glycols, which are solid at ambient temperatures, but
liquefy or dissolve in the rectal cavity to release the drug.

Preferred administration routes include the oral and parenteral routes, the latter
including intravenous, intramuscular or subcutaneous routes. More preferably, the compounds
of the present invention are administered parenterally, i.e., intravenously or
intraperitoneally, by infusion or injection.

Solutions or suspensions of the compounds can be prepared in water or isotonic
saline (PBS), optionally mixed with a nontoxic surfactant. Dispersions may also
be prepared in glycerol, liquid polyethylene glycols, DNA, vegetable oils, triacetin
and mixtures thereof. Under ordinary conditions of storage and use, these
preparations may contain a preservative to prevent the growth of microorganisms.

The pharmaceutical dosage form suitable for injection or infusion use can
include sterile, aqueous solutions or dispersions or sterile powders comprising an
active ingredient which are adapted for the extemporaneous preparation of sterile
injectable or infusible solutions or dispersions. In all cases, the ultimate dosage form
should be sterile, fluid and stable under the conditions of manufacture and storage.
The liquid carrier or vehicle can be a solvent or liquid dispersion medium
comprising, for example, water, ethanol, a polyol such as glycerol, propylene glycol,
or liquid polyethylene glycols and the like, vegetable oils, nontoxic glyceryl esters,
and suitable mixtures thereof. The proper fluidity can be maintained, for example,
by the formation of liposomes, by the maintenance of the required particle size, in
the case of dispersion, or by the use of nontoxic surfactants. The prevention of the
action of microorganisms can be accomplished by various antibacterial and
antifungal agents, for example, parabens, chlorobutanol, phenol, sorbic acid,
thimerosal, and the like. In many cases, it will be desirable to include isotonic
agents, for example, sugars, buffers, or sodium chloride. Prolonged absorption of
the injectable compositions can be brought about by the inclusion in the composition
of agents delaying absorption, for example, aluminum monostearate hydrogels and
gelatin.

Sterile injectable solutions are prepared by incorporating the compounds in
the required amount in the appropriate solvent with various other ingredients as
enumerated above and, as required, followed by filter sterilization. In the case of
sterile powders for the preparation of sterile injectable solutions, the preferred
methods of preparation are vacuum drying and freeze-drying techniques, which yield
a powder of the active ingredient plus any additional desired ingredient present in
the previously sterile-filtered solutions.

Industrial Applications

The GuxA polypeptides of the invention are effective cellulases. In the
methods of the invention, the cellulose degrading effects of GuxA are achieved by
treating biomass at a ratio of about 1 to about 50 of GuxA:biomass. GuxA may be
used under extreme conditions, for example, elevated temperatures and acidic pH.
Treated biomass is degraded into simpler forms of carbohydrates, and in some cases
glucose, which is then used in the formation of ethanol or other industrial chemicals,
as is known in the art. Other methods are envisioned to be within the scope of the
present invention, including methods for treating fabrics to remove
cellulose-containing stains and other methods already discussed. GuxA polypeptides can be
used in any known application currently utilizing a cellulase, all of which are within
the scope of the present invention.

Having generally described the invention, the same will be more readily
understood by reference to the following examples, which are provided by way of
illustration and are not intended as limiting.

EXAMPLES 
Example 1: Molecular Cloning of GuxA

Genomic DNA was isolated from Acidothermus cellulolyticus and purified
by banding on cesium chloride gradients. Genomic DNA was partially digested with
Sau3A and separated on agarose gels. DNA fragments in the range of 9-20 kilobase
pairs were isolated from the gels. This purified Sau3A-digested genomic DNA was
ligated into the BamHI acceptor site of purified EMBL3 lambda phage arms
(Clontech, San Diego, Calif.). Phage DNA was packaged according to the
manufacturer's specifications and plated with E. coli LE392 in top agar which
contained the soluble cellulose analog, carboxymethylcellulose (CMC). The plates
were incubated overnight (12-24 hours) to allow transfection, bacterial growth, and
plaque formation. Plates were stained with Congo Red followed by destaining with 1
M NaCl. Lambda plaques harboring endoglucanase clones showed up as unstained
plaques on a red background.

Lambda clones which screened positive on CMC-Congo Red plates were
purified by successive rounds of picking, plating and screening. Individual phage
isolates were named SL-1, SL-2, SL-3 and SL-4. Subsequent subcloning efforts
employed the SL-3 clone which contained an approximately 14.2 kb fragment of A.
cellulolyticus genomic DNA.

Template DNA was constructed using a 9 kb BamHI fragment obtained from
the 14.2 kb lambda clone SL-3 prepared from Acidothermus cellulolyticus genomic
DNA. The 9-kb BamHI fragment from SL-3 was subcloned into pDR540 to generate
plasmid NREL501. NREL501 was first sequenced by the primer walking method,
as is known in the art. NREL501 was then subcloned into pUC19 using restriction
enzymes PstI and EcoRI and transformed into E. coli XL1-Blue (Stratagene, La Jolla,
California) for the production of template DNA for sequencing. Each subclone was
sequenced from both forward and reverse directions. DNA for sequencing was
prepared from an overnight growth in 500 mL LB broth using a megaprep DNA
purification kit from Promega. The template DNA was PEG precipitated,
suspended in de-ionized water and adjusted to a final concentration of 0.25 mg/mL.
Custom primers were designed by reading upstream known sequence and selecting
segments of an appropriate length to function, as is well known in the art. Primers
for cycle sequencing were synthesized at the Macromolecular Resources facility
located at Colorado State University in Fort Collins, Colorado. Typically the
sequencing primers were 26-30 nucleotides in length, but were sometimes longer or
shorter to accommodate a melting temperature appropriate for cycle sequencing.
The sequencing primers were diluted in de-ionized water, the concentration
measured using UV absorbance at 260 nm, and then adjusted to a final concentration
of 5 pmol/µL. Templates and sequencing primers were shipped to the Iowa State
University DNA Sequencing facility at Ames, Iowa for sequencing using standard
chemistries for cycle sequencing. In many cases, regions of the template that
sequenced poorly using the standard protocols and dye terminators were repeated
with the addition of 2 µL DMSO and by using nucleotides optimized for the
sequencing of high GC content DNA. The high frequency of recurring small
domains (i.e., CBDs and linkers) with high sequence similarity caused initial
difficulties in sequence assignments which were only resolved through extensive
review of the data and repeat analyses.
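
The adjustment of primer and template stocks to a fixed final concentration follows the usual dilution relation C1 x V1 = C2 x V2. The short Python sketch below illustrates that arithmetic only; the function name `dilution_volumes` and the example stock concentration are hypothetical and are not taken from the specification.

```python
def dilution_volumes(stock_conc, target_conc, final_volume):
    """Return (stock volume, diluent volume) needed to reach target_conc.

    Uses C1 * V1 = C2 * V2. Concentrations must share one unit
    (for example pmol/uL) and volumes share another (for example uL).
    """
    if stock_conc <= 0 or target_conc <= 0 or final_volume <= 0:
        raise ValueError("concentrations and volume must be positive")
    if target_conc > stock_conc:
        raise ValueError("cannot dilute to a higher concentration")

    stock_volume = target_conc * final_volume / stock_conc
    diluent_volume = final_volume - stock_volume
    return stock_volume, diluent_volume


if __name__ == "__main__":
    # Hypothetical primer stock of 100 pmol/uL brought to 5 pmol/uL in 200 uL.
    stock_ul, water_ul = dilution_volumes(100.0, 5.0, 200.0)
    print(f"primer stock: {stock_ul} uL, de-ionized water: {water_ul} uL")
```

The same function applies to the template DNA when its measured concentration and the 0.25 mg/mL target are expressed in the same unit.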

Sequencing data from primer walking and subclones were assembled
together to verify that all SL-3 regions had been sequenced from both strands. An
open reading frame (ORF), termed GuxA, was found in the 9-kb BamHI fragment, C-terminal of
E1 (U.S. Patent No. 5,536,655).

An ORF of about 3687 bp [SEQ ID NO: 2], including a stop codon, and
deduced amino acid sequence [SEQ ID NO: 1] are shown in Tables 1 and 2. The
amino acid sequence predicted by SEQ ID NO: 1 was determined to have significant
homology to known cellulases, as shown below in Example 2 and in Tables 3 and 4.
The amino acid sequence represents a novel member of the family of proteins with
cellulase activity. Due to the source of isolation from the thermophilic organism
Acidothermus, GuxA is a novel member of cellulases with properties including
thermal tolerance. It is also known that thermal tolerant enzymes may have other
properties (see definition above).

Example 2: GuxA includes a GH6 catalytic domain 

Sequence alignments and comparisons of the amino acid sequences of the
Acidothermus cellulolyticus GuxA first catalytic domain (aa 54 to 476),
Cellulomonas fimi CBHA (beta-(1,4) exocellobiohydrolase) and Thermobifida fusca
E3 (beta-(1,4) exocellulase) polypeptides were prepared, using the ClustalW
program (Thompson J.D. et al. (1994), Nucleic Acids Res. 22:4673-4680) from the
EMBL European Bioinformatics Institute website (http://www.ebi.ac.uk/).

An examination of the amino acid sequence alignment of the GH6 domains
indicates that the amino acid sequence of the GuxA catalytic domain is homologous
to the amino acid sequences of known GH6 family catalytic domains for C. fimi
CBHA and T. fusca E3 (See Table 3). In Table 3, the notations are as follows: an
asterisk "*" indicates identical or conserved residues in all sequences in the
alignment; a colon ":" indicates conserved substitutions; a period "." indicates
semi-conserved substitutions; and a hyphen "-" indicates a gap in the sequence. The
amino acid sequence predicted for the GuxA GH6 domain is approximately 55%
identical to the C. fimi CBHA GH6 domain and approximately 48% identical to the
T. fusca E3 GH6 domain, indicating that the GuxA first catalytic domain is a
member of the GH6 family (Henrissat et al. (1991), supra).
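
As an illustration of how a percent identity figure can be derived from a pairwise alignment, the following Python sketch counts identical residues over the aligned columns. It is a minimal sketch under stated assumptions: the function name `percent_identity` is hypothetical, columns containing a gap in either sequence are excluded from the denominator (other conventions exist and give different values), and the toy sequences are not taken from Table 3.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two rows of the same alignment.

    Both strings must have equal length. Columns where either row
    holds a gap are skipped, which is one of several conventions.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # gap column: not counted
        compared += 1
        if res_a == res_b:
            matches += 1

    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy aligned fragments, for illustration only.
    print(percent_identity("VDNPYAG-ATF", "VDNPFEGAKLY"))
```

Because the choice of denominator changes the result, the convention used should be stated alongside any reported identity value.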



Table 3. Multiple amino acid sequence alignment of a GuxA first catalytic
domain and polypeptides with Glycoside Hydrolase Family 6 catalytic domains.

Multialignment of related Glycoside Hydrolase Family 6 catalytic domains
GH6_Ace: Acidothermus cellulolyticus GuxA catalytic domain GH6
CBHA_Cfi: Cellulomonas fimi CBHA (beta-1,4-exocellobiohydrolase). GenBank Acc. # AAC36898
E3_Tfu: Thermobifida fusca E3 (beta-1,4-exocellulase). GenBank Acc. # U18978

GH6_Ace   "ATHVDNPYAGATFFTOPYWAQEVQSEAANQTN-ATLAAKMRWSTYSTAVWMDRIAAIN
CBHA_Cfi  APVHVDNPYAGAVQYVNPTWAASVNAAAGRQSADPALiAAKMRTVAGQPTAVWMDRI SAI T
E3_Tfu    PGGPTNPPTNPGEKVDNPFEGAKLYVNPVW- S AKAAAE PGGSAVANE S TAVWLDR I GAI E
          ** ******** **

GH6_Ace   GVN GGPGLTTYLDAALSQQQGT - TPEVI E I VI YDLPGRDCAALASNGELPATAAGL
CBHA_Cfi  GNA DGNGLKFHIiDNAVAQQKAAGVPLVFNLVIYDLPGRDCFALASNGEIjPATDAGL
E3_Tfu    GNDSPTTGSMGLRDHIjEEAVRQSGGD - - PLTIQWI YNLPGRDCAAIiASNGELGPDE- -L
          ** * . * ***.****** ********

GH6_Ace   QTYETQYI DPIASI LSN- PKYSSLRI VTI I EPDSLPNAVTNMSI QACATAVPYYEQ
CBHA_Cfi  ARYKSEYIDPI ADLLDN- PEYES I RIAATIEPDSLPNLTTNI SEPACQQAAPYYRQ
E3_Tfu    DRYKSEYI DPI AD IMWDFAD YENLRI VAI I E I DSLPNLVTNVGGNGGTELiCAYMKQNGGY
          ******* ... * .** * ** ***** ** .

GH6_Ace   - -GIEYALTKLHAI PNVY I YMDAAHSGWLGWPNNAS GYVQE VQKVLN - AS I GVNG I DGF V
CBHA_Cfi  - - GVKYALD KLHAI PNVYNY I D I GHS GWLGWD SNAGPS ATIjFAE VAKSTTAGFAS I DGFV
E3_Tfu    VNGVGYALRKLGEI PNVYNYIDAAHHGWIGWDSNFGPSVDI FYEAANASGSTVDYVHGFI
          *. *** ** ***** *.* * ***** **

GH6_Ace   TNTANYTPLKEPFMT - ATQQVGGQP VE S ANF YQWNPD I DEAD YAVDL YSRLVAAGF P S S I
CBHA_Cfi  SDVANTTPDEE PLLSDS SLTINNTP I RS SKFYEWNFDFDE I D YTAHMHRLL VAAGF P S S I
E3_Tfu    SNTANYSATVE PYLD - VNGTVNGQL I RQSKWVDWNQYVDELSFVQDLRQAL IAKGFRSD I
          ** ** ** ** * . * * * * *

GH6_Ace   GML I DTLRNGWGGPNEPTGPSTATDVNTFVNQSKI DLRQHRGLWCNQNGAGLGQPPQASP
CBHA_Cfi  GMLVDTSRNGWGGPNRPTSITASTDVNAYVDANRVDRRVHRGAWCNPLGAGIGRFPEATP
E3_Tfu    GML IDTSRNGWGGPNRPTGPSSSTDLNTYVDESRI DRRI HPGNWCNQAGAGLGERPTVNP
          ***.** ******** ** ...**.*..*- ..* * * * *** ***;* * *

GH6_Ace   TDFPNAHLDAYVWIKPPGESDGTSAASDPTTGKKSDPMCDPTYTTS - - YGVLTN-ALPNS
CBHA_Cfi  SGYAASHLDAFVWI KPPGESDGASTDI PNDQGKRFDRMCDPTFVSPKLNNQLTG-ATPNA
E3_Tfu    -APGVDAYVWVKPPGESDGASEEIPNDEGKGFDRMCDPTYQGNARNGNNPSGALPNA
          .**.**.********.* ** ******* ..***:

GH6_Ace   P I AGQWFPAQ FDQIiVANARPAV
CBHA_Cfi  PLAGQWFEEQFVTLVKNAYPVI
E3_Tfu    PI SGHWFSAQFRELIiANAYPPL

Example 3: GuxA includes a GH12 catalytic domain

Sequence alignments and comparisons of the amino acid sequences of the
Acidothermus cellulolyticus GuxA second catalytic domain (aa 860 to 1090),
Streptomyces sp. 11AG8 cellulase 12A (endoglucanase) and Streptomyces lividans
cellulase B (endoglucanase) polypeptides were prepared, using the ClustalW
program (EMBL; supra). An examination of the amino acid sequence alignment of
the GH12 domains indicates that the amino acid sequence of the GuxA second
catalytic domain is homologous to the amino acid sequences of known GH12 family
catalytic domains for Streptomyces sp. cellulase 12A and S. lividans cellulase B (See
Table 4). The amino acid sequence predicted for the GuxA GH12 domain is
approximately 45% identical to the Streptomyces sp. cellulase 12A GH12 domain
and approximately 42% identical to the S. lividans cellulase B GH12 domain,
indicating that the GuxA second catalytic domain is a member of the GH12 family
(Henrissat et al. (1991), supra).

Table 4. Multiple amino acid sequence alignment of a GuxA second catalytic
domain and polypeptides with Glycoside Hydrolase Family 12 catalytic
domains.

Multialignment of related Glycoside Hydrolase Family 12 catalytic domains
GH12_Ace: Acidothermus cellulolyticus GuxA Hydrolase Family 12 catalytic domain
Cel12A_Ssp: Streptomyces sp. 11AG8 cellulase 12A (endoglucanase). GenBank Acc. # AAF91283
CelB_Sli: Streptomyces lividans cellulase B (endoglucanase). GenBank Acc. # AAB71950

Cel12A_Ssp  NQQ I CDRYGTTT I QD - RYVVQNNRWGTSATQCI NV- TGNG- FE ITQADGS VPTN
CelB_Sli    DTT I CEPFGTTT I QG - RYWQNNRWGSTAPQCVTA- TDTG - FRVTQADGSAPTN
GH12_Ace    CTPGPNQNGVTS VQGDEYRVQTNE WNS SAQQCLT INTATGAWT VSTANFSGGTG
            *.*::*. .* **.*.*.;:* **:. * . * : :: *: * * .

Cel12A_Ssp  GAPKSYPSVYDGCHYGNCAPR-TTLPMRI SSIGSAPSSVSYRYTGNGVYNAAYDI WLDPT
CelB_Sli    GAPKSYPSVFNGCHYTNCSPG -TDLPVRLDTVSAAPSS I S YGFVDGAVYNAS YD I WLDPT
GH12_Ace    GAPATYPS I YKGCHWGNCTTKNVGMP I Q I SQ IGSAVTSWSTTQVS SGAYDVAYD I WTNST
            *** .***...***. ** : . . :*:::. :.:* :* * *...**** : .*

Cel12A_Ssp  PRTNG - VNRTE I M I WFNRVGPVQP I GS PVGT - AHVGGRS WEVWTGSNGSNDVI S FLAPSA
CelB_Sli    ARTDG - VNQTE I MI WFNRVGP IQPIGS PVGT - AS VGGRTWEVWSGGNGSNDVLS FVAPSA
GH12_Ace    PTTTGQPNGTEIMIWLNSRGGVQPFGSQTATGVTVAGHTWNVWQGQQTSWKI ISYVLTPG
            . * * * ******.* * .**.** . *.*..*.** * . * .::*:: ...

Cel12A_Ssp  I SSWS - FDVKDF VDQAVSHGLATPDWYLTS I QAGFEPWEGGTGLAVNS FS SAVN
CelB_Sli    I SGWS - FDVMDFVRATVARGLAENDWYLTS VQAGFEPWQNGAGLAVNS FS STVE
GH12_Ace    ATS I SNLDLKAI FADAAARGSLNTSDYLLDVEAGFE I WQGGQGLGSNSFSVSVT

Example 4: Mixed Domain GH6, GH12, CBD II, CBD III Genes and Hybrid
Polypeptides

From the putative locations of the domains in the GuxA cellulase sequence
given above and in comparable cloned cellulase sequences from other species, one
can separate individual domains and combine them with one or more domains from
different sequences. The significant similarity between cellulase genes permits one,
by recombinant techniques, to arrange one or more domains from the Acidothermus
cellulolyticus GuxA cellulase gene with one or more domains from a cellulase gene
from one or more other microorganisms. Other representative endoglucanase genes
include Bacillus polymyxa beta-(1,4) endoglucanase (Baird et al., Journal of
Bacteriology, 172: 1576-86 (1992)) and Xanthomonas campestris beta-(1,4)-
endoglucanase A (Gough et al., Gene 89:53-59 (1990)). The result of the fusion of
any two or more domains will, upon expression, be a hybrid polypeptide. Such
hybrid polypeptides can have one or more catalytic or binding domains. For ease of
manipulation, recombinant techniques may be employed such as the addition of
restriction enzyme sites by site-specific mutagenesis. If one is not using one domain
of a particular gene, any number of any type of change including complete deletion
may be made in the unused domain for convenience of manipulation.
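
At the sequence level, the domain shuffling described above amounts to cutting residue ranges out of parent sequences and joining them in a chosen order. The Python sketch below illustrates this in silico step only, assuming 1-based inclusive residue coordinates such as the "aa 54 to 476" and "aa 860 to 1090" ranges given in Examples 2 and 3. The function names and the placeholder sequences are hypothetical; real constructs would also need linkers and in-frame junctions designed at the DNA level.

```python
def extract_domain(sequence, start, end):
    """Return residues start..end (1-based, inclusive) of a protein sequence."""
    if not (1 <= start <= end <= len(sequence)):
        raise ValueError("domain coordinates fall outside the sequence")
    return sequence[start - 1:end]


def build_hybrid(parts):
    """Concatenate domains given as (sequence, start, end) tuples, in order."""
    return "".join(extract_domain(seq, start, end) for seq, start, end in parts)


if __name__ == "__main__":
    # Placeholder sequences standing in for two parent cellulases.
    parent_a = "MKTAYIAKQRQISFVKSHFSRQ"
    parent_b = "MSDNGPQNQRNAPRITFGGPSD"

    # Hypothetical coordinates: one domain from each parent.
    hybrid = build_hybrid([(parent_a, 3, 10), (parent_b, 12, 20)])
    print(hybrid)
```

Keeping coordinates 1-based and inclusive matches the way residue ranges are quoted in the text and avoids off-by-one errors when translating them into string slices.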

It is understood for purposes of this disclosure, that various changes and
modifications may be made to the invention that are well within the scope of the
invention. Numerous other changes may be made which will readily suggest
themselves to those skilled in the art and which are encompassed in the spirit of the
invention disclosed herein and as defined in the appended claims.

This specification contains numerous citations to references such as patents,
patent applications, and publications. Each is hereby incorporated by reference for
all purposes.

Claims

1. A composition comprising a substantially purified thermostable GuxA
peptide, said GuxA peptide comprising a first catalytic domain GH6, a second
catalytic domain GH12, a carbohydrate binding domain (CBD) type III, and a
carbohydrate binding domain (CBD) type II.

2. The composition of claim 1 further defined as comprising a sequence of SEQ
ID NO: 4, SEQ ID NO: 7, SEQ ID NO: 5 and SEQ ID NO: 8.

3. A thermal tolerant GuxA peptide having a sequence of SEQ ID NO: 1.

4. An industrial mixture suitable for degrading cellulose, such mixture
comprising the GuxA polypeptide of claim 1.

5. The industrial mixture of claim 4 further defined as comprising a detergent.

6. An isolated polypeptide molecule comprising:
a) a sequence of SEQ ID NO: 4;
b) a sequence of SEQ ID NO: 7;
c) a sequence of SEQ ID NO: 5;
d) a sequence of SEQ ID NO: 8;
e) a sequence of SEQ ID NO: 1; or
f) an amino acid sequence having at least 70% sequence identity with the
amino acid sequence of a), b), c), d), or e).

7. A fusion protein comprising the polypeptide of claim 6 and a heterologous
peptide.

8. The fusion protein of claim 7, wherein the heterologous peptide is a leucine
zipper.

9. A cellulase-substrate complex comprising the isolated polypeptide molecule
of claim 6 bound to cellulose.

10. A vector comprising the polynucleotide molecule that encodes a polypeptide
of claim 6.

11. A host cell genetically engineered to express the polypeptide molecule of
claim 6.

12. A composition comprising the polypeptide molecule of claim 6 and a carrier.

13. An isolated antibody that specifically binds to the polypeptide molecule of
claim 6.

14. A method for producing GuxA polypeptide, the method comprising:
incubating a host cell genetically engineered to express the polypeptide
molecule of claim 6.

15. A set of amplification primers for amplification of a polynucleotide molecule
encoding GuxA, comprising:
two or more sequences comprising 9 or more contiguous nucleic acids
derived from the polypeptide molecule of claim 6.

16. A probe for hybridizing to a polynucleotide encoding GuxA, comprising:
a sequence of 9 or more contiguous nucleic acids derived from the
polypeptide molecule of claim 6.

17. An assay method for the detection of a polynucleotide encoding GuxA,
comprising:
amplifying a nucleic acid sequence with a set of amplification primers
comprising two or more sequences of 9 or more contiguous nucleic acids derived
from the polypeptide molecule of claim 6; and
correlating the amplified nucleic acid sequence with detected polynucleotide
encoding GuxA.

18. A method for assessing the carbohydrate degradation activity of GuxA
comprising:
analyzing a carbohydrate degradation in the presence of GuxA and a
carbohydrate degradation in the absence of GuxA on a substrate; and
comparing the carbohydrate degradation in the presence of GuxA with the
carbohydrate degradation in the absence of GuxA.

19. A method for reducing cellulose in a starting material, the method
comprising:
administering to the starting material an effective amount of a polypeptide
molecule of claim 6.

20. The method of claim 19, wherein the starting material is agricultural
biomass.

Figure: Domain structure of GuxA